AITopics | right side

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents

Neural Information Processing Systems | Jun-23-2026, 04:08:58 GMT

A major challenge in training VLM agents, compared to LLM agents, is that states shift from simple text to complex visual observations, which introduces partial observability and demands robust world modeling. We ask: can VLM agents build internal world models through explicit visual state reasoning? In this work, we architecturally enforce and reward the VLM agent's reasoning process via reinforcement learning (RL), formulating the problem as a Partially Observable Markov Decision Process (POMDP). We demonstrate that structuring the agent's reasoning into State Estimation ("what is the current state?") and Transition Modeling ("what is next?") is critical by studying five reasoning strategies. Investigating how agents should ground visual states and represent these internal beliefs, we reveal that the optimal representations are task-dependent: Natural Language excels at capturing semantic relationships for general tasks, while Structured formats are essential for high-precision manipulation. These insights motivate our approach to reward shaping and credit assignment. We leverage a World Modeling Reward to densely reward the agent's turn-by-turn state predictions, while our Bi-Level General Advantage Estimation (Bi-Level GAE) enables turn-aware credit assignment. Through such world model reasoning, we enable a 3B model to achieve performance of 0.82 on a set of five diverse agent tasks, nearly a 3× improvement over its untrained counterpart (0.21) and surpassing proprietary reasoning models like GPT-5 (0.75), Gemini 2.5 Pro (0.67) and Claude 4.5 (0.62). All experiments are supported by our VAGEN framework, a scalable system for training and analyzing multi-turn VLM agents across diverse visual environments.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Bipartite Stochastic Block Models with Tiny Clusters

Stefan Neumann

Neural Information Processing Systems | Feb-14-2026, 02:26:37 GMT

Discovering clusters in bipartite graphs has been researched in many different settings. However, most of these algorithms are heuristics and do not provide theoretical guarantees for the quality of their results.

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Europe > Austria > Vienna (0.15)
North America > Canada (0.04)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Chaining Mutual Information and Tightening Generalization Bounds

Amir Asadi, Emmanuel Abbe, Sergio Verdu

Neural Information Processing Systems | Feb-13-2026, 16:35:10 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, hypothesis, mutual information, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Modelling and unsupervised learning of symmetric deformable object categories

James Thewlis, Hakan Bilen, Andrea Vedaldi

Neural Information Processing Systems | Feb-12-2026, 09:11:30 GMT

Top: input images with the axis of symmetry superimposed (shown in green). In fact, our method builds on [38] and also learns a dense geometric embedding for objects, however, by using a different supervision principle, symmetry.

artificial intelligence, inproc, symmetry, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Technology: Information Technology > Artificial Intelligence > Vision (0.94)

811d35e47edbb191c19151f3c5f80f53-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 07:19:05 GMT

inequality, monotonically, right side, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.46)

Bipartite Stochastic Block Models with Tiny Clusters

Stefan Neumann

Neural Information Processing Systems | Nov-20-2025, 19:11:13 GMT

We study the problem of finding clusters in random bipartite graphs.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.14)
Europe > Austria > Vienna (0.14)
North America > United States (0.14)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Chaining Mutual Information and Tightening Generalization Bounds

Amir Asadi, Emmanuel Abbe, Sergio Verdu

Neural Information Processing Systems | Nov-20-2025, 18:13:54 GMT

Two important difficulties are (i) exploiting the dependencies between the hypotheses, and (ii) exploiting the dependence between the algorithm's input and output.

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents

Kangrui Wang, Pingyue Zhang, Zihan Wang, Yaning Gao, Linjie Li, Qineng Wang, Hanyang Chen, Chi Wan, Yiping Lu, Zhengyuan Yang, Lijuan Wang, Ranjay Krishna, Jiajun Wu, Li Fei-Fei, Yejin Choi, Manling Li

arXiv.org Artificial Intelligence | Oct-21-2025

A key challenge in training Vision-Language Model (VLM) agents, compared to Large Language Model (LLM) agents, lies in the shift from textual states to complex visual observations. This transition introduces partial observability and demands robust world modeling. We ask: Can VLM agents construct internal world models through explicit visual state reasoning? To address this question, we architecturally enforce and reward the agent's reasoning process via reinforcement learning (RL), formulating it as a Partially Observable Markov Decision Process (POMDP). We find that decomposing the agent's reasoning into State Estimation ("what is the current state?") and Transition Modeling ("what comes next?") is critical for success, as demonstrated through five reasoning strategies. Our investigation into how agents represent internal beliefs reveals that the optimal representation is task-dependent: Natural Language excels at capturing semantic relationships in general tasks, while Structured formats are indispensable for precise manipulation and control. Building on these insights, we design a World Modeling Reward that provides dense, turn-level supervision for accurate state prediction, and introduce Bi-Level General Advantage Estimation (Bi-Level GAE) for turn-aware credit assignment. Through this form of visual state reasoning, a 3B-parameter model achieves a score of 0.82 across five diverse agent benchmarks, representing a 3× improvement over its untrained counterpart (0.21) and outperforming proprietary reasoning models such as GPT-5 (0.75), Gemini 2.5 Pro (0.67) and Claude 4.5 (0.62). All experiments are conducted within our VAGEN framework, a scalable system for training and analyzing multi-turn VLM agents in diverse visual environments. Code and data are publicly available at https://vagen-ai.github.io.
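For readers unfamiliar with the advantage estimation the abstract refers to, the following is a minimal sketch of standard, single-level Generalized Advantage Estimation over one trajectory. It is not the paper's Bi-Level GAE, whose details are not given in this listing. The function name `gae`, the argument layout, and the default values of `gamma` and `lam` are illustrative assumptions.

```python
from typing import List


def gae(rewards: List[float], values: List[float],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Standard (single-level) GAE for one trajectory.

    `values` must hold one more entry than `rewards`; the final entry is
    the bootstrap value of the state after the last step (0.0 if terminal).
    Names and defaults are illustrative, not taken from the VAGEN code.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    # With gamma = lam = 1 and zero values, advantages are reward-to-go sums.
    print(gae([1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0))
```

With `lam=0` the estimate reduces to the one-step TD error, and with `lam=1` it becomes the discounted return minus the value baseline. A turn-aware variant would presumably apply this kind of recursion at more than one granularity, but that is a guess from the name rather than something stated here.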

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.16907

Country: North America > United States (0.45)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

HumanoidVerse: A Versatile Humanoid for Vision-Language Guided Multi-Object Rearrangement

Haozhuo Zhang, Jingkai Sun, Michele Caprio, Jian Tang, Shanghang Zhang, Qiang Zhang, Wei Pan

arXiv.org Artificial Intelligence | Aug-26-2025

We introduce HumanoidVerse, a novel framework for vision-language guided humanoid control that enables a single physically simulated robot to perform long-horizon, multi-object rearrangement tasks across diverse scenes. Unlike prior methods that operate in fixed settings with single-object interactions, our approach supports consecutive manipulation of multiple objects, guided only by natural language instructions and egocentric camera RGB observations. HumanoidVerse is trained via a multi-stage curriculum using a dual-teacher distillation pipeline, enabling fluid transitions between sub-tasks without requiring environment resets. To support this, we construct a large-scale dataset comprising 350 multi-object tasks spanning four room layouts. Extensive experiments in the Isaac Gym simulator demonstrate that our method significantly outperforms prior state-of-the-art in both task success rate and spatial precision, and generalizes well to unseen environments and instructions. Our work represents a key step toward robust, general-purpose humanoid agents capable of executing complex, sequential tasks under real-world sensory constraints. The video visualization results can be found on the project page: https://haozhuo-zhang.github.io/HumanoidVerse-project-page/.

artificial intelligence, instruction, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.16943

Genre: Research Report (0.50)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
